Cross-component Clustering for Template Induction

Authors

  • Zvika Marx
  • Eli Shamir
Abstract

We propose an unsupervised approach to template induction for information extraction, based on detecting sub-topics and themes that cut across the documents of a topical corpus. We introduce a new method, cross-component clustering, that simultaneously clusters the components forming our setting, each of which consists of the words of a single article. Our algorithm is derived from the Information Bottleneck clustering algorithm. The resulting clusters are found to correspond systematically to the sets of terms used to fill the slots of the ready-made MUC3/4 template, which was used for evaluation.
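
As background for the Information Bottleneck method mentioned in the abstract, the following is a minimal sketch of the standard iterative (soft) Information Bottleneck updates applied to a small word-by-feature count matrix. It is not the cross-component algorithm of the paper, whose details are not given here. The function name `ib_cluster`, the parameters `beta` and `n_iter`, and the toy counts are all assumptions made for illustration.

```python
import math
import random


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q), with q floored at eps."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def ib_cluster(counts, n_clusters, beta=10.0, n_iter=50, seed=0):
    """Soft-cluster rows of counts (hypothetical helper, illustration only).

    counts[x][y] is the co-occurrence count of word x with feature y.
    Every row must have a positive sum. Returns q(t|x) as a list of rows.
    """
    rng = random.Random(seed)
    n_x, n_y = len(counts), len(counts[0])
    total = sum(sum(row) for row in counts)
    p_x = [sum(row) / total for row in counts]
    p_y_given_x = [normalize(row) for row in counts]

    # Random soft initialisation of the assignments q(t|x).
    q_t_given_x = [
        normalize([rng.random() + 0.1 for _ in range(n_clusters)])
        for _ in range(n_x)
    ]

    for _ in range(n_iter):
        # Cluster marginals: q(t) = sum_x p(x) q(t|x).
        q_t = [
            sum(p_x[x] * q_t_given_x[x][t] for x in range(n_x))
            for t in range(n_clusters)
        ]
        # Cluster centroids: q(y|t) = sum_x p(x) q(t|x) p(y|x) / q(t).
        q_y_given_t = [
            [
                sum(p_x[x] * q_t_given_x[x][t] * p_y_given_x[x][y]
                    for x in range(n_x)) / max(q_t[t], 1e-12)
                for y in range(n_y)
            ]
            for t in range(n_clusters)
        ]
        # Assignments: q(t|x) proportional to q(t) exp(-beta * KL(p(y|x) || q(y|t))).
        updated = []
        for x in range(n_x):
            logits = [
                math.log(max(q_t[t], 1e-12))
                - beta * kl(p_y_given_x[x], q_y_given_t[t])
                for t in range(n_clusters)
            ]
            peak = max(logits)  # subtract the maximum for numerical stability
            updated.append(normalize([math.exp(v - peak) for v in logits]))
        q_t_given_x = updated

    return q_t_given_x
```

Calling `ib_cluster(counts, 2)` on a matrix whose first two rows share one set of features and whose last two rows share another would be expected to place each pair in its own cluster. Larger `beta` values give harder assignments.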

Similar resources

Using Clustering and Factor Analysis in Cross Section Analysis Based on Economic-Environment Factors

Homogeneity of groups is important in studies that use cross-section and multi-level data. Most studies in economics, especially panel data analyses, need some kind of homogeneity to ensure the validity of their results. This paper presents the methods known as clustering and homogenization of groups in cross-section studies based on enviro-economic components. For this, a sample of 92 countries which...

Synthesis and Evaluating of Nanoporous Molecularly Imprinted Polymers for Extraction of Quercetin as a Bioactive Component of Medicinal Plants

In this work, the template, monomer, and cross-linker with the ratio of 1:8:40 were used to synthesize Molecularly Imprinted Polymers (MIPs) for extraction of the bioactive chemical compounds from some traditional herbs as a sorbent material. Quercetin, Methacrylic Acid (MAA), Trimethylolpropanetrimethacrylate (TRIM) and Tetrahydrofuran (THF) were used as a template, funct...

Unmixed Spectrum Clustering for Template Composition in Lung Sound Classification

In this paper, we propose a method for composing templates of lung sound classification. First, we obtain a sequence of power spectra by FFT for each given lung sound and compute a small number of component spectra by ICA for each of the overlapping sets of tens of consecutive power spectra. Second, we put component spectra obtained from various lung sounds into a single set and conduct cluster...

Template Matching using Statistical Model and Parametric Template for Multi-Template

This paper presents a template matching algorithm that uses a statistical model and a parametric template for multi-template. The algorithm consists of two phases: training and matching. In the training phase, the statistical model created by principal component analysis (PCA) can be used to synthesize the multi-template. The advantage of PCA is that it reduces the variance of the multi-template. In the matc...

Evaluation of Similarity Measures for Template Matching

Image matching is a critical process in various photogrammetry, computer vision and remote sensing applications such as image registration, 3D model reconstruction, change detection, image fusion, pattern recognition, autonomous navigation, and digital elevation model (DEM) generation and orientation. The primary goal of the image matching process is to establish the correspondence between two ...

Publication date: 2002